# 文件目录

这篇文章介绍了如何加载一个目录中的所有文档。

底层默认使用[UnstructuredLoader](/docs/integrations/document_loaders/unstructured_file)。

```python
from langchain_community.document_loaders import DirectoryLoader
```

我们可以使用`glob`参数来控制加载哪些文件。注意这里不加载`.rst`和`.html`文件。

```python
loader = DirectoryLoader('../', glob="**/*.md")
```

```python
docs = loader.load()
```

```python
len(docs)
```

## 显示进度条

默认情况下不会显示进度条。要显示进度条，需要安装`tqdm`库（例如`pip install tqdm`），并将`show_progress`参数设置为`True`。

```python
loader = DirectoryLoader('../', glob="**/*.md", show_progress=True)
docs = loader.load()
```



```
    Requirement already satisfied: tqdm in /Users/jon/.pyenv/versions/3.9.16/envs/microbiome-app/lib/python3.9/site-packages (4.65.0)


    0it [00:00, ?it/s]
```


## 使用多线程

默认情况下，加载是在一个线程中完成的。为了利用多个线程，将`use_multithreading`标志设置为true。

```python
loader = DirectoryLoader('../', glob="**/*.md", use_multithreading=True)
docs = loader.load()
```

## 更改加载类

默认情况下，使用`UnstructuredLoader`类。不过，你可以很容易地更改加载器的类型。

```python
from langchain_community.document_loaders import TextLoader
```

```python
loader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader)
```

```python
docs = loader.load()
```

```python
len(docs)
```

---



```
    1
```

---



如果需要加载Python源代码文件，请使用`PythonLoader`。

```python
from langchain_community.document_loaders import PythonLoader
```

```python
loader = DirectoryLoader('../../../../../', glob="**/*.py", loader_cls=PythonLoader)
```

```python
docs = loader.load()
```

```python
len(docs)
```



```
    691
```



## 使用TextLoader自动检测文件编码

在这个例子中，我们将看到在使用`TextLoader`类加载目录中的大量任意文件时，一些有用的策略。


首先我们来说明问题，试着加载多个使用不同编码的文件。

```python
path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)
```

### A. 默认行为

```python
loader.load()
```



```
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"><span style="color: #800000; text-decoration-color: #800000">╭─────────────────────────────── </span><span style="color: #800000; text-decoration-color: #800000; font-weight: bold">Traceback </span><span style="color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold">(most recent call last)</span><span style="color: #800000; text-decoration-color: #800000"> ────────────────────────────────╮</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">text.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">29</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span>                             <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">26 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span>text = <span style="color: #808000; text-decoration-color: #808000">""</span>                                                                           <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">27 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">with</span> <span style="color: #00ffff; text-decoration-color: #00ffff">open</span>(<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.file_path, encoding=<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.encoding) <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> f:                             <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">28 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">try</span>:                                                                            <span style="color: #800000; text-decoration-color: #800000">│</span>
...
...
...
```

---



`example-non-utf8.txt`文件使用了不同的编码，所以`load()`函数会失败，并给出一个有帮助的提示，指示哪个文件解码失败。

使用`TextLoader`的默认行为是，如果任何一个加载失败，整个加载过程都会失败，并且不会加载任何文档。

### B. 静默失败

我们可以将`silent_errors`参数传递给`DirectoryLoader`，以跳过无法加载的文件，并继续加载过程。

```python
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)
docs = loader.load()
```

```python
doc_sources = [doc.metadata['source']  for doc in docs]
doc_sources
```

---



```
    ['../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
     '../../../../../tests/integration_tests/examples/example-utf8.txt']
```

---



### C. 自动检测编码

我们还可以要求`TextLoader`在解码失败之前自动检测文件的编码，通过传递`autodetect_encoding`参数给加载器类。

```python
text_loader_kwargs={'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()
```

```python
doc_sources = [doc.metadata['source']  for doc in docs]
doc_sources
```

---



```
    ['../../../../../tests/integration_tests/examples/example-non-utf8.txt',
     '../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
     '../../../../../tests/integration_tests/examples/example-utf8.txt']
```

---

